1. Project Description

The research supported under this award focused on heterogeneous distributed computing
for high-performance applications, with particular emphasis on computational
aerosciences. The overall goal of this project was to investigate issues in, and
develop solutions to, efficient execution of computational aeroscience codes in
heterogeneous concurrent computing environments. In particular, we worked in the context
of the PVM [1] system and, following detailed conversion efforts and performance
benchmarking, devised novel techniques to increase the efficacy of heterogeneous
networked environments for computational aerosciences. Our work has been based upon
the NAS Parallel Benchmark suite, but has recently expanded in scope to include the
NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize
our research accomplishments under the auspices of the grant.


2. Research Accomplishments 

The first phase of the project involved converting and coding the five NAS NPB
kernels for the PVM system and experimentally investigating their performance in three
different cluster environments. These results were very well received and have been
detailed in publications [3], [4], and [5]. In addition, the benchmarks were made publicly
available to the research community, and over 175 requests for the software were received
and honored. During this period, the NAS NPB benchmark specifications evolved with the
addition of a new problem-size class and a few minor modifications. In the next phase,
we focused on three major activities: (a) revision of the PVM versions of the NAS codes,
and inclusion of the three simulated applications to construct a full complement of PVM
NAS benchmarks; (b) development of a multithreaded framework for PVM, a new research
effort aimed at enhancing the efficiency of the NAS benchmarks and other scientific
applications by enabling overlapped communication and computation, and by enabling
smaller granularity without loss of performance; and (c) development of a parallel I/O
interface for PVM to support large parallel applications that require high I/O
bandwidth. Towards the end of the project, we further revised and refined the PVM
substrate, conducted a new suite of performance benchmarks on high-end workstations
interconnected by high-speed ATM and 100-Mb Ethernet networks, and obtained contemporary
results for NPB codes on cluster computing systems.
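
The overlap of communication and computation that motivated activity (b) can be
illustrated with a minimal sketch. It assumes Python's standard threading and queue
modules as a stand-in for PVM and TPVM; the function pipelined_process and its fetch
parameter (a hypothetical "receive block i" step) are illustrative names, not part of
either system. While the calling thread computes on one block, a receiver thread is
already fetching the next.

```python
import queue
import threading
from typing import Callable, List, Optional

# Sketch only: Python threads stand in for PVM/TPVM, and "fetch" is a
# hypothetical stand-in for receiving block i of a distributed data set.
def pipelined_process(
    fetch: Callable[[int], List[float]],
    compute: Callable[[List[float]], float],
    n_blocks: int,
) -> List[float]:
    """Compute on block i while a receiver thread is already fetching block i+1."""
    buf: "queue.Queue[Optional[List[float]]]" = queue.Queue(maxsize=2)  # double buffer
    errors: List[BaseException] = []

    def receiver() -> None:
        try:
            for i in range(n_blocks):
                buf.put(fetch(i))  # blocks while two fetched blocks are still unconsumed
        except BaseException as exc:  # hand the failure to the computing thread
            errors.append(exc)
        finally:
            buf.put(None)  # sentinel: no more blocks

    # Daemon thread, so an exception in compute() cannot leave the process hanging.
    t = threading.Thread(target=receiver, daemon=True)
    t.start()
    results: List[float] = []
    while True:
        block = buf.get()
        if block is None:
            break
        results.append(compute(block))  # runs while the next fetch is in progress
    t.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    # Hypothetical workload: block i holds four copies of i; compute sums a block.
    print(pipelined_process(lambda i: [float(i)] * 4, sum, 3))  # [0.0, 4.0, 8.0]
```

The bounded queue of size two acts as a double buffer, so the receiver never runs more
than two blocks ahead of the computation; smaller blocks (finer granularity) then cost
less because their transfer latency is hidden behind useful work.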

Brief description of work 

Our tasks in this project included converting, enhancing, and optimizing the NPB
kernels to execute under PVM; experimenting with the PVM versions of the NPB codes;
developing a full benchmark suite by adding the BT, SP, and LU benchmarks; and
analyzing performance and efficiency. This work was performed by the PI (Vaidy
Sunderam), by postdoctoral fellows Steve Moyer and Soeren Olesen, and by Adam Ferrari,
Nancy Sedora, and Xiaowu Lu, Emory graduate students partially supported under this
award. These results are described in the papers listed below as well as in graduate
theses [6], [7]. The NPB benchmark software, and the PIOUS software for the NHT
benchmarks, have been placed in the public domain; since release, over 175 requests for
the NPB software and over 120 requests for the PIOUS software have been received. Work
on these conversions and experiments has continued over the past several months, and
benchmark results have recently been consolidated in a technical report that will be
submitted to a journal or conference for public dissemination.

To experimentally investigate novel methods of improving the efficacy of network-based
concurrent computing, we designed and developed a threads-based system for PVM. The
TPVM (Threads-oriented PVM) system is an experimental auxiliary
subsystem for the PVM distributed system, which supports the use of lightweight 
processes or “threads” as the basic unit of parallelism and scheduling. TPVM provides a 
library interface that presents both a traditional, task-based, explicit message-passing
model, as well as a data-driven scheduling model that enables straightforward 
specification of computation based on data dependencies. The TPVM system comprises 
three basic modules: a library interface that provides access to thread-based distributed 
concurrent computing facilities, a portable thread interface module which abstracts the 
required threads-related services, and a thread server module which performs scheduling 
and system data management. Our design is still under development, but a prototype 
implementation has allowed us to perform a number of preliminary experiments. These 
have provided strong evidence that TPVM can offer improved performance, processor 
utilization, and load balance to several application categories. The research results of the 
TPVM system are included in [7], recently submitted for publication. 
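
The data-driven scheduling model can be sketched in a few lines. This is a generic
illustration under stated assumptions, not the TPVM interface: each hypothetical task
names the value it produces and the values it consumes, and is dispatched to a thread
as soon as all of its inputs exist. Python's concurrent.futures thread pool stands in
for the TPVM thread server, and run_dataflow and Task are illustrative names.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, List, Tuple

# A hypothetical task: (name of the value it produces, names of its inputs, function).
Task = Tuple[str, List[str], Callable[..., Any]]


def run_dataflow(tasks: List[Task], initial: Dict[str, Any], workers: int = 4) -> Dict[str, Any]:
    """Dispatch each task to a thread as soon as all of its inputs exist."""
    values: Dict[str, Any] = dict(initial)
    pending: List[Task] = list(tasks)
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # Start every task whose data dependencies are now satisfied.
            still_waiting: List[Task] = []
            for out, ins, fn in pending:
                if all(name in values for name in ins):
                    running[pool.submit(fn, *[values[n] for n in ins])] = out
                else:
                    still_waiting.append((out, ins, fn))
            pending = still_waiting
            if not running:
                # Nothing can run and nothing is in flight: some input is never produced.
                missing = ", ".join(out for out, _, _ in pending)
                raise ValueError(f"unsatisfiable dependencies for: {missing}")
            # Block until at least one running task finishes, then record its output.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                values[running.pop(fut)] = fut.result()
    return values


if __name__ == "__main__":
    tasks: List[Task] = [
        ("d", ["b", "c"], lambda b, c: b + c),  # listed first, but must wait for b and c
        ("b", ["a"], lambda a: a + 1),
        ("c", ["a"], lambda a: a * 2),  # b and c are independent and may run concurrently
    ]
    print(run_dataflow(tasks, {"a": 3})["d"])  # 10
```

The programmer states only what each computation needs; the order of execution, and
which independent computations overlap, is left to the scheduler.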

In a related effort aimed at addressing the data handling and I/O requirements of
large parallel applications, we have also been working on the PIOUS parallel I/O system
under the auspices of this award. This work is motivated by the fact that most
metacomputing environments either provide no I/O facilities or serialize all I/O
requests, thus limiting application performance and scalability. PIOUS is a parallel
file system designed to incorporate true parallel I/O into existing metacomputing
environments. Because PIOUS is itself a parallel application, it extends the
functionality of a given metacomputing environment without requiring any modification
to that environment. Two papers [8], [9] have been written describing the PIOUS software
architecture and programming model, as well as preliminary results from a prototype
PIOUS implementation. Performance results are encouraging, demonstrating the potential
of the PIOUS architecture for achieving scalable file system bandwidth in a network
computing environment.
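
Scalable file system bandwidth of this kind typically comes from declustering a file
across several data servers, so that one large request is served by many servers at
once. The following minimal sketch assumes simple round-robin striping with a fixed
stripe unit; it is not the PIOUS file layout or API, in-memory byte strings stand in
for hypothetical per-server storage, and all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Assumption for illustration: round-robin striping with a fixed stripe unit
# across n hypothetical data servers (not the actual PIOUS layout or API).


def locate(offset: int, stripe: int, n_servers: int) -> Tuple[int, int]:
    """Map a logical file offset to (server index, offset within that server's segment)."""
    unit, within = divmod(offset, stripe)
    return unit % n_servers, (unit // n_servers) * stripe + within


def stripe_file(data: bytes, stripe: int, n_servers: int) -> List[bytes]:
    """Distribute a file across servers, one stripe unit at a time."""
    segments = [bytearray() for _ in range(n_servers)]
    for start in range(0, len(data), stripe):
        segments[(start // stripe) % n_servers] += data[start:start + stripe]
    return [bytes(s) for s in segments]


def split_request(offset: int, length: int, stripe: int, n_servers: int) -> List[Tuple[int, int, int]]:
    """Break a logical read into (server, local offset, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe, n_servers)
        nbytes = min(stripe - offset % stripe, end - offset)  # stay inside one stripe unit
        pieces.append((server, local, nbytes))
        offset += nbytes
    return pieces


def parallel_read(segments: List[bytes], offset: int, length: int, stripe: int) -> bytes:
    """Issue all pieces concurrently and reassemble them in logical order."""
    def fetch(piece: Tuple[int, int, int]) -> bytes:
        server, local, nbytes = piece
        return segments[server][local:local + nbytes]  # stands in for a remote server read

    pieces = split_request(offset, length, stripe, len(segments))
    # Pool sized to the number of servers, so all servers can be busy at once.
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        return b"".join(pool.map(fetch, pieces))  # map() yields results in request order


if __name__ == "__main__":
    data = bytes(range(100))
    segs = stripe_file(data, stripe=8, n_servers=3)
    print(parallel_read(segs, offset=5, length=50, stripe=8) == data[5:55])  # True
```

Because consecutive stripe units live on different servers, a large read touches all
servers in parallel, and aggregate bandwidth can grow with the number of servers rather
than being capped by a single serialized I/O path.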

We developed the two subsystems outlined above to improve the performance
and functionality of the PVM network computing system as part of our ongoing research
in heterogeneous computing. Several of the ideas and implementation techniques were 
influenced by the requirements of computational aeroscience applications, as exemplified 
by the NPB kernels and application benchmarks. Simultaneously with these efforts, we 
also continued to fine-tune and optimize the NPB implementations for PVM as well as
the NHT-1 I/O benchmarks. We have conducted systematic measurements of these at 
several stages during the course of the project, on machines and networks that varied in 
capabilities and speed. Two such sets of results have been reported in technical reports, 
the second one just recently. These efforts indicate the viability of network and cluster 
computing for high performance computational science applications and for other 
scientific computing codes, and highlight the constraints and conditions under which 
sub-optimal as well as near-optimal performance is likely to be achieved. 

Deliverables produced 

By the end of the project, the research supported by this award had produced the
following deliverables:

• A complete suite of the 8 NAS parallel benchmarks for the PVM system, comprising
the 5 kernels and 3 simulated applications and totaling over 23,000 lines of code.
Both versions together have been distributed to over 175 requestors since the
beginning of this project.

• A preliminary version of the TPVM system for threads-based concurrent computing 
in PVM has been designed and developed. Initial results are very encouraging, and
several external research groups have been given the software to facilitate their own 
research. 
• A beta version of the PIOUS parallel I/O system has been completed and released.
Over 120 copies of the distribution had been given out at the time this report was
written.

Research Output 

In terms of research and education, the following have been accomplished under the 
auspices of this award: 

• Experimental research data concerning the use of cluster computing environments
for computational aeroscience applications have been measured and published.

• Novel systems-level enhancements to network computing systems have been
developed and incorporated into the PVM heterogeneous computing system.

• Two postdoctoral fellows, Steven Moyer and Soeren Olesen, have received their
postdoctoral training in aspects of this project.

• Three graduate students, Adam Ferrari, Nancy Sedora, and Xiaowu Lu, have been
partially supported by this grant and have completed master's theses.

• Eleven papers have been written as a result of work performed under this grant; ten
have been accepted, and one has just been completed and will be submitted for
publication.

• This work has resulted in four invited lectures by the PI and three contributed talks
at conferences.